1
When presented with two opposed sets of ideas, if one is interested to any degree in the question, it is natural to be biased towards one side. So it is for me regarding the question of behaviourism vs. whatever the opposite of that is. “Haha, those bloody behaviourists got utterly destroyed in the middle of the 20th century.” So satisfying to have that confirmed over and over when you read your preferred sources. This is human nature, of course. At another time, you might even find yourself sympathetic to the opposite viewpoint, when you are less emotionally invested, curious again, and pick up a book arguing for the “other side.”
As for what behaviourism is: it is the school of thought that argues you can’t really understand anyone else’s mind or psychology, and can therefore only “explain” them in terms of their behaviour. The opposite, then, would be something along the lines of explaining how another being operates — offering some sort of “mechanistic” explanation of their behaviour in terms of parts and causality.
I probably ended up identifying with this lineage when I read David Deutsch’s two books, The Beginning of Infinity and The Fabric of Reality. Deutsch is an adherent of Karl Popper, who made major contributions to the philosophy of science during the 20th century and whose declared enemy was Logical Positivism — which argued that knowledge could only be derived from what can be observed, a view Popper countered with the idea that knowledge is conjectural and can only be refuted, not verified. The conjecture part comes up again in Deutsch’s concept of the explanation, which is precisely about a system’s how. In short, you cannot predict a system’s behaviour from past observations; you have to have a theory of how the system internally works.[1]
Now, the irony for me here was that at some point, in a moment of self-reflection, I realised that as a software engineer I have become pretty biased towards seeing the specification as, in a sense, more important than the code, for various reasons — which makes me pretty much a hardcore behaviourist here. Adding insult to injury, this realisation, already painful for reasons of cognitive dissonance, puts me at the other end of the spectrum on this question from a sentiment expressed by Rich Hickey (our BDFL of Clojure, my programming language of choice) in his most famous talk. Here’s what he has to say:
I think we’re in this world, I like to call guardrail programming. Right. It’s really sad. We’re like, I can make change cause I have tests. Right? Who does that? Who drives their car around banging against the guardrails, saying whoa, I’m glad I’ve got these guardrails because I’d never make it to the show on time.[2]
2
It is not an accident that people are interested in this topic at the present moment. The context is AI — of course. Since these things generate so fast, more thought from the engineering profession naturally goes in the direction of how to constrain them (maybe a little bit in contrast to the general audience, who tends to frame things more in terms of taste and judgment).[3]
Accordingly, spec-driven development in all its variations is a topic of some relevance in the context of agentic engineering. One example from the Clojure sphere is the Allium specification language. The opening paragraph of the announcement post, From specification to stress test: a weekend with Claude, should give an impression of what people hope to achieve with this kind of approach:
Over a weekend, between board games and time with my kids, Claude and I built a distributed system with Byzantine fault tolerance, strong consistency and crash recovery under arbitrary failures. I described the behaviour I wanted in Allium, worked through the bugs conversationally and didn’t write a line of implementation code.[4]
I think we are clear that these kinds of experiments — which were fashionable in early 2026, like the “AI built a browser” ones — are not to be confused with day-to-day work, and require somewhat special circumstances. So it is not these grand claims, but rather the general idea that code can in principle be generated from specs, that faces the more intelligent scepticism at this point. The article A sufficiently detailed spec is code[5] pretty much nails it with its title alone.
Though it hits the nail on the head a bit too well. It’s true in the sense of a truism that doesn’t convey much information. For why do we have tests at all, then?
When I heard that Hickey quote, which didn’t seem to match my experience, I pondered what made him say it and thought it was perhaps a difference in perspective: he was mainly focused on creating library functions, which you can basically hold in your head as a single individual. Whereas my thinking was more about growing codebases that don’t focus on providing neat abstractions, but rather address ever-new requirements. Factor in multiple teams working on the same codebase, and at least one thing is clear: nobody can hold that thing in their head. But of course, Hickey would know this. And I think we should see that quote as a temperamental bias and pushback against silver-bullet thinking around testing at the time — much as the spec-is-code argument pushes back against the “we need only specs” crowd right now.
3
Specs/tests and code are two different things. That much is obvious on the face of it. But not only are tests not equivalent to code, as the spec-equals-code thinking would suggest — they are in fact deficient, as they cannot capture the full behaviour of the system under test in all but the most trivial cases. Consider how sparse tests are in principle (their coverage is always “insufficient”) compared with the code that actually generates outputs from inputs: the code, representing “the formula,” is often much shorter than the tests trying to pin it down. So I think it is in order to reconstruct an argument that gives a good perspective on what different purposes the two actually serve.
A couple of days ago I listened to an interview on the ClojureStream podcast with Tony Kay,[6] the author of the Fulcro framework for data-driven web applications, well regarded in Clojure circles. He seemed to appreciate testing a great deal from a methodological standpoint; in fact, his runtime-checking library Guardrails (!)[7] can probably be taken as a testament to that. In any case, he emphasised that there is always the tricky question of “what to test,” pointing to the combinatorial explosion that comes with functions of multiple or complex inputs and outputs. Testing is always example-based: you give example pairs of inputs and outputs, omitting all possible in-between values.
I wrote about this in a previous article, The Triangulation Method.[8] To reiterate that point quickly, let’s say — for the sake of argument — you have a function that multiplies all values to the left of 4 by 5, squares inputs to the right of 4, and is undefined at 4. So what do you test here? The answer is that you need to test sufficient examples to the left of 4, then 4 itself, then sufficient examples to the right. And probably also the behaviour at 0 and for negative values.
(ns example.core-test
  (:require [clojure.test :refer [deftest is]]))

(defn f [x]
  (cond (< x 4) (* x 5)
        (> x 4) (* x x)))

(deftest f-test
  (is (= 5 (f 1)))     ; left of 4
  (is (= nil (f 4)))   ; at 4: no cond clause matches, so nil
  (is (= 25 (f 5)))    ; right of 4
  (is (= 0 (f 0)))     ; zero
  (is (= -10 (f -2)))) ; negative
So, as one can see and as I’ve indicated above, one does not — in fact, one cannot — test the full range of outputs the function can generate. But then there is another, more important point for our discussion: in constructing the test, we seem to need a lot of knowledge about the function’s internals!
A function is not determined by tests — in much the same way that curve fitting allows any number of curves to match a given set of points. But what tests do for you, in my opinion, is give you a grip on complexity that mulling over code simply cannot. Thinking in specs and tests first of all lets you gather requirements (although this in itself is often no small task). It lets you reach a point of disambiguation, when you ask: “OK, but let’s say we are here — what should the system respond with now?” Based on that, a developer can use their intuition, built on hard-won experience, to “see” — to conjecture, to use Popper’s word — the necessary case distinctions.
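To make the curve-fitting point concrete, here is a minimal sketch reusing the example function from earlier (the names f-a and f-b are mine): two implementations that agree on every tested input, yet diverge on inputs the test suite never touches.

```clojure
(ns underdetermined.core)

;; f-a is the "intended" rule from the earlier example.
(defn f-a [x]
  (cond (< x 4) (* x 5)
        (> x 4) (* x x)))

;; f-b passes the exact same example tests...
(defn f-b [x]
  (cond (< x 4) (* x 5)
        (= x 5) 25    ; ...matching the one tested point right of 4...
        (> x 4) 0))   ; ...while diverging everywhere else right of 4.

;; Every assertion from the test suite holds for both functions:
(assert (= (f-a 1) (f-b 1) 5))
(assert (= (f-a 4) (f-b 4) nil))
(assert (= (f-a 5) (f-b 5) 25))
(assert (= (f-a 0) (f-b 0) 0))
(assert (= (f-a -2) (f-b -2) -10))

;; Yet the functions are not the same curve:
(assert (not= (f-a 6) (f-b 6))) ; 36 vs. 0
```

Like points on a graph, the examples constrain the function without pinning it down; infinitely many “curves” pass through them.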
The argument here is that tests are there primarily to reduce cognitive load. It is for this reason that I once commented on a junior developer’s pull request as follows. The task centred on sorting done by the database, and the test verified the correct sort order using sort-by, a Clojure core function. Now, looking at the test, a developer has to make an inference of sorts, however small, every time. I suggested: why not spell it out and plainly state the expectation — Alan, Bob, Tim, Tina.
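The suggestion might look like the following sketch. The fixture and the fetch-sorted-users stand-in are hypothetical (the real test exercised a database query); only the names come from the review comment.

```clojure
(ns sorting-test
  (:require [clojure.test :refer [deftest is]]))

;; Hypothetical fixture standing in for rows in the database.
(def users
  [{:name "Tim"} {:name "Alan"} {:name "Tina"} {:name "Bob"}])

;; Stand-in for the query that returns users sorted by name.
(defn fetch-sorted-users []
  (sort-by :name users))

;; Version 1: the reader must mentally evaluate sort-by
;; to know which order is actually expected.
(deftest sorted-users-derived
  (is (= (sort-by :name users)
         (fetch-sorted-users))))

;; Version 2: the expectation is spelled out literally —
;; the test tells the story at a glance.
(deftest sorted-users-explicit
  (is (= ["Alan" "Bob" "Tim" "Tina"]
         (map :name (fetch-sorted-users)))))
```

Version 1 also risks mirroring the implementation: if the production code sorts with the same expression, the test can only confirm the code agrees with itself.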
Tests are for “telling a story.”
4
There is this idea, which I’ve always found a little funny, that there are cross-cutting concerns — because that implies there are concerns which are not cross-cutting. The way I see it, you start an application with major ideas and a major architectural orientation, and then further down the road things start not to fit in neatly, and you call these cross-cutting concerns. A likely counter would be: “yes, but I mean logging.” OK, but then the next version of your thing probably says that old way of handling that sucked, and now you proudly advertise that your new thing 2.0 treats logging as a first-class citizen! And let’s be real, we don’t usually limit the term to things like logging anyway. Consider your CRUD app, which focuses on Type A and Type B entities, and the non-CRUD page you now need that deals with both — which ends up happening everywhere in your application anyway. These things always break down at some point, as every taxonomy ends up with “edge cases” that stubbornly refuse to fit.[9]
A useful analogy in my mind is to see a system as something three-dimensional and tests as something two-dimensional — and you happen to unfortunately inhabit 2D space in this picture. Narratives help you manage complexity by focusing on aspects you can work your way through mentally in a linear fashion — one at a time.[10] Which is, when you think about it, similar to a behaviourist’s resignation: we live on the outside of the system, and we make our peace with that by telling stories about how it behaves. Not exactly glorious.
And I think that’s a little bit what is at the heart of the issue. Now that LLMs can infer the case distinctions from tests — to put it within the framing established above — there is a fear that human developers will be devalued. Two interesting artefacts I found speaking to this were a YouTube video titled I was a 10x engineer. Now I’m useless.[11] by Mo Bitar, and an article titled I Sold Out for $20 a Month and All I Got Was This Perfectly Generated Terraform.[12] The former speaks to the loss of relationship with the generated code. It is just too easy, in an inflationary sense — you no longer value it. Think of how your grandparents kept certain things with a “for bad days” attitude, whereas you throw yours away and buy a replacement at the first excuse that comes along.
The article then approaches this from a perspective where the author puts himself in the shoes of a product manager, basically. I mean, we all learn at some point in our careers that it’s about the business value, not the code. In fact, this is seen as a mark of real maturity in a developer. But at heart, we are developers and we care, and when we see tech debt piling up, our hearts bleed. We say this is because of anticipated problems, but equally often, I suspect it is because of a sense of beauty violated. In both artefacts, there is some serious struggle going on, and I congratulate both authors for their honesty.
Apart from the fact that it is probably healthy to pull ourselves out of myopic attachments from time to time, this shift leaves a lot of room for expression — both as programmers showing how much logic we can handle, and as ideators who can make machines do things for us. The latter, in a sense, brings us closer to laymen, who “know what they like when they see it.”
And if that doesn’t appeal to you, consider that things are fractal: it simply shifts you up a level, to architectural beauty. Can’t you see it in compartmentalising your architecture in ways that prevent failures from affecting unrelated subsystems? Here we are right back at the point of judging the internals of the system (Deutsch’s “explanation”), in the precise sense I outlined earlier: a function is never fully determined by its tests — nor is a system by its specs.
As engineers, we are proud of our tools and love our objects, but what we should really take pride in is our methods. Whatever you can bring a machine to do for you, engineers are the ones who can control it beyond the point a layman ever could. In the same vein: there was a lot of chatter recently about how one person can now start a billion-dollar business. Well — can’t ten people leveraging that same tech still accomplish more? So in short: I don’t think anyone will take anything from anyone. Long-term, that is. What I do agree with is that change is difficult.
Footnotes
For example, with regard to Artificial General Intelligence, he argues we can only build it when we understand what intelligence is (but trying to build it furthers our understanding, of course). This echoes Feynman: “What I cannot create, I do not understand.”
Simple Made Easy from 2011 (at 16:08, see YouTube).
But the common theme remains, as the article Rick Rubin Is the Future of Work argues with the work method of the music producer, who has become an icon of this idea: “He can’t do the how. He is the undisputed master of the what.”
The article can be found here: juxt.pro/blog
Find it here: haskellforall.com. Another article, Reports of code’s death are greatly exaggerated (source: stevekrouse.com), referencing the first, continues to argue that case.
E103 - LLM Experience report with Tony Kay; link.
Can be found here on GitHub: fulcrologic/guardrails
At which point I always cite my favourite paper, paleontologist Stephen Jay Gould’s 1983 What, If Anything, Is a Zebra? (online source: link).
I have a feeling that Donald Knuth’s “literate programming” idea was born out of that very intuition.